Fast and accurate multi-class protein fold recognition with spatial sample kernels

Authors

  • Pavel Kuksa
  • Pai-Hsi Huang
  • Vladimir Pavlovic

Abstract

Establishing structural or functional relationships between sequences, for instance to infer the structural class of an unannotated protein, is a key task in biological sequence analysis. Recent computational methods such as profile and neighborhood mismatch kernels have shown very promising results for protein sequence classification, but at the cost of high computational complexity. In this study we address multi-class sequence classification problems using a class of string-based kernels, the sparse spatial sample kernels (SSSK), that are both biologically motivated and efficient to compute. The proposed methods can work with very large databases of protein sequences and show substantial improvements in computing time over existing methods. Applying the SSSK to multi-class protein prediction problems (fold recognition and remote homology detection) yields significantly better performance than existing state-of-the-art algorithms.
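The abstract does not spell out how a spatial sample kernel is computed. Here is a minimal sketch that assumes the common spatial-sample formulation. Each feature is a tuple (a1, d1, a2, ..., d_{t-1}, a_t) of t sampled k-mers a_i. Consecutive k-mers are separated by a distance d_i with 1 <= d_i <= d, which is measured here, as an assumption, between start positions. The kernel value is the inner product of the two sequences' feature counts. The names `spatial_features`, `sssk`, and `normalized_sssk`, and the defaults k=1, t=2, d=5, are hypothetical and chosen for illustration. This is not the authors' implementation, and it uses direct enumeration rather than the efficient computation the paper reports.

```python
import math
from collections import Counter


def spatial_features(seq, k=1, t=2, d=5):
    """Count spatial-sample features (a1, d1, a2, ..., d_{t-1}, a_t) in seq.

    Assumption (illustrative): each a_i is a k-mer and each d_i, with
    1 <= d_i <= d, is the offset between start positions of consecutive
    sampled k-mers.
    """
    counts = Counter()
    n = len(seq)

    def extend(pos, feature, remaining):
        # All t k-mers have been sampled: record the feature tuple.
        if remaining == 0:
            counts[tuple(feature)] += 1
            return
        for gap in range(1, d + 1):
            nxt = pos + gap
            if nxt + k > n:  # next k-mer would run past the end of seq
                break
            extend(nxt, feature + [gap, seq[nxt:nxt + k]], remaining - 1)

    for start in range(n - k + 1):
        extend(start, [seq[start:start + k]], t - 1)
    return counts


def sssk(x, y, k=1, t=2, d=5):
    """Unnormalized kernel: inner product of the two feature-count vectors."""
    fx = spatial_features(x, k, t, d)
    fy = spatial_features(y, k, t, d)
    # Counter returns 0 for absent keys, so only shared features contribute.
    return sum(c * fy[f] for f, c in fx.items())


def normalized_sssk(x, y, **params):
    """Cosine-normalized kernel (an assumed, optional step): K(x, x) == 1."""
    denom = math.sqrt(sssk(x, x, **params) * sssk(y, y, **params))
    return sssk(x, y, **params) / denom if denom else 0.0


if __name__ == "__main__":
    print(sssk("ABAB", "ABAB"))       # 8
    print(sssk("ABAB", "ABAB", d=1))  # 5
    print(normalized_sssk("ACDEFG", "ACDEFH"))
```

Setting t=1 reduces the enumeration to plain k-mer counting. Each start position contributes at most d^(t-1) features, so this naive version is adequate only for small t and d. In a multi-class setting, the resulting kernel matrix would then be passed to a kernel classifier such as an SVM.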

Similar resources

A fast, large-scale learning method for protein sequence classification

Motivation: Establishing structural and functional relationships between sequences in the presence of only the primary sequence information is a key task in biological sequence analysis. This ability can be critical for tasks such as making inferences of the structural class of unannotated proteins when no secondary or tertiary structure is available. Recent computational methods based on profi...

A New Class of Spatial Covariance Functions Generated by Higher-order Kernels

Covariance functions and variograms play a fundamental role in exploratory analysis and statistical modelling of spatial and spatio-temporal datasets. In this paper, we construct a new class of spatial covariance functions using the Fourier transform of some higher-order kernels. Moreover, we extend this class of spatial covariance functions to the spatio-temporal setting using the idea used in...

Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection

MOTIVATION The problems of protein fold recognition and remote homology detection have recently attracted a great deal of interest as they represent challenging multi-feature multi-class problems for which modern pattern recognition methods achieve only modest levels of performance. As with many pattern recognition problems, there are multiple feature spaces or groups of attributes available, s...

Learning Large Margin First Order Decision Lists for Multi-Class Classification

Inductive Logic Programming (ILP) systems have been successfully applied to solve binary classification problems. It remains an open question how an accurate solution to a multi-class problem can be obtained by using a logic based learning method. In this paper we present a novel logic based approach to solve challenging multi-class classification problems. Our technique is based on the use of ...

Multi-class protein fold recognition using support vector machines and neural networks

MOTIVATION Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examined many issues important for a practical recognition system. RESULTS Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known '...

Journal title:
  • Computational systems bioinformatics. Computational Systems Bioinformatics Conference

Volume: 7

Publication date: 2008